Local to Global Normalization Dynamic by Nonlinear Local Interactions

Here, I present a novel method for normalizing a finite set of numbers, which is studied
within the framework of biological vision. Normalizing in this context means finding the maximum and
minimum number in a set and then rescaling all numbers such that they fit into a fixed numerical interval.
My method computes the minimum and maximum number by two pseudo-diffusion processes in
separate diffusion layers. The activity of these layers feeds into a third layer, which performs the rescaling
operation. The dynamic of the network is richer than merely performing a rescaling of its input,
and reveals phenomena like contrast detection, contrast enhancement, and a transient compression
of the numerical range of the input. Apart from presenting computer simulations, some properties
of the diffusion operators and the network are analyzed mathematically. Furthermore, a method is
proposed for freezing the model's state when adaptation is observed.

I. INTRODUCTION



What is the difference between adaptation and nor- 
malization? Are these just two distinct processes, or can 
they be related? The purpose of this paper is to develop 
a model whose dynamic smoothly proceeds from local 
adaptation to global normalization. Mathematical prop- 
erties of the model are analyzed, and its dynamical prop- 
erties are evaluated with luminance images. I study the 
model within the framework of biological vision, where 
emphasis is laid on understanding the emergence of adap- 
tation within the model's dynamic. Finally, a method is 
proposed for freezing the dynamic at the moment when 
adaptation occurs. But to begin with, I briefly describe 
how adaptation and normalization contribute to infor- 
mation processing in the brain. 

Adaptation refers to the adjustment of a sense organ
to the intensity or quality of stimulation [1]. There is
agreement that adaptation is important for the function
of nervous systems, since without corresponding mechanisms
any given neuron with its limited dynamic range
would stay silent or operate in saturation most of the time
[2]. When considering a population of cells (e.g. formal
processing units or biological neurons), adaptation
is usually understood as a locally acting process, which
can be carried out independently for individual cells or
groups of cells, respectively (e.g., individual photoreceptors
[3, 4, 5] vs. groups of photoreceptors [6, 7]). Thus,
adaptation refers to sensitivity adjustment of output signals
as a function of input signals.

Normalization on the other hand usually refers to estab- 
lishing standardized conditions for one or more qualities. 
For example, at some stage in the brain, the retinal image
may have been normalized with respect to illumination
conditions, such that each face or object is represented
with similar illumination patterns, and subsequent recognition
stages work in a more robust fashion. Or, once a
face image has been detected by an artificial face recog- 
nition system, it can be normalized with respect to head 
tilt or head rotation. In this way a standardized can- 
didate face is obtained, which facilitates matching it to 
other standardized faces from a database. 
Normalization is also used for describing the establishment
of standardized conditions for a population of neurons.
In this context, normalization processes usually act
as gain control mechanisms. For instance, Grossberg [8]
proposed "shunting competitive networks" (in his terms)
for accurate signal processing in the presence of noise,
in order to avoid the noise-saturation dilemma. Because neurons
have a fixed input range, weak signals get masked by
noise, and neurons signal only the noisy fluctuations in
the input signal. On the other hand, strong signals cause
neurons to saturate, and any variations within the input
cannot be distinguished. Shunting networks implement
the multiplicative relationship between membrane
voltages of neurons and conductance changes that are
caused by network input on the one hand and signals on
the other. This multiplicative relationship acts as a gain
control mechanism that enables these networks to automatically
re-tune their sensitivity in response to fluctuating
background inputs. As Grossberg demonstrated [8],
such networks exhibit a normalization property in the
sense that the total (or pooled) activity of all neurons
is independent of the number of neurons. Along these
lines, Heeger and co-workers proposed a normalization
model to account for the observed non-linearities in
cortical simple cell responses, such as response saturation
and cross-orientation inhibition [9, 10, 11]. Similar
to Grossberg's "shunting competitive networks", in
Heeger's model a neuron's output activity is adjusted by
the pooled activity of a population of many other neurons
("normalization pool"). This "normalization pool"
exerts divisive inhibition on the response of a target neuron,
and in this way it acts as a gain control mechanism
for that cell.

FIG. 1: Possible network structures for extracting minimum or maximum activities of cells. Each of the networks
shown in this figure is supposed to select the maximum (or minimum) activity value among the leftmost units. The selected
maximum (or minimum) is available at the rightmost unit. (a) Two-layer network with global connectivity pattern. (b) Three-layer
network which extracts in its second layer the (local) maximum of the units to which it is connected. The rightmost unit
subsequently selects from these local maxima the global one.

The circuit models proposed by Grossberg and Heeger describe
how responses of a group of neurons can be normalized.
Both methods rely on the interaction of some target
neuron with a number of surrounding neurons. The interaction
is brought about by hard-wiring the target neurons
with surrounding neurons. In contrast, the normalization
scheme introduced in this paper is based on diffusion 
mechanisms, and thus interactions only take place be- 
tween adjacent cells. Specifically, within the scope of the 
present paper, normalization is understood as mapping a 
set of numbers with finite but in principle arbitrary nu- 
merical range onto a fixed target range (below we will see 
that non-trivial features like contrast enhancement and 
adaptation phenomena emerge from a network which im- 
plements this normalization mechanism). 
Whereas in Grossberg's scheme the normalization process
renders the total activity of a group of cells independent
of the number of cells ([8]), with my definition
of normalization it is clear that in most cases the ac- 
tivity summed over all cells will depend on their num- 
ber. A further difference concerns the implementation of 
activity bounds. In Grossberg's scheme, reversal poten- 
tials establish an upper (lower) bound on the activity of 
each cell which can be reached by excitation (inhibition) . 
However, the highest activity value of the normalized cell 
population depends on the activity of all other cells (as 
the total activity is constant). In other words, one cannot 
rely on the presence of distinguished activity values as it 
is the case in my approach. In a normalized population 
of my approach there is always at least one cell which 
has zero activity, and at least one cell with activity one. 
The usual procedure for normalizing a set of numbers
can be subdivided into two successive stages. First, the 
maximum and the minimum members of the set are de- 
termined. These two values are then used in a second 
stage for re-scaling all set elements such that after re- 
scaling the elements fall into a pre-defined numerical in- 
terval (or numerical range). 



If we wish to design a corresponding algorithm for the
first stage of the just described process (i.e. finding the
maximum and the minimum), we would have to employ
two memories for storing the current (i.e., a local)
maximum and minimum, and compare these values
successively with all remaining set elements. After
all comparisons are finished, the memories contain
the global maximum and minimum. Because every set
member has to interact explicitly with the memories, the
whole process is said to involve global operations. The
global nature is mirrored in the connection structure of
a correspondingly designed network. Figure 1(a) shows
a schematic drawing of such a network, where one distinguished
network unit shares connections with all the
others. This unit is supposed to represent the maximum
(or minimum) activity value of the set of units to which it
is connected. Due to its global connectivity, however,
this network does not seem to be a very plausible candidate
for a "biological" model, because (biological) neurons
are known to interact in a more local fashion. This
implausibility can be relaxed by proposing an alternative
connectivity pattern (figure 1(b)).
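
The conventional two-stage procedure described above can be sketched in a few lines of Python. This is only an illustration of the global scheme that serves as the point of departure; the function name `normalize_global` and the target interval [0, 1] are assumptions made for this sketch.

```python
def normalize_global(values):
    """Two-stage normalization: scan for min and max, then rescale to [0, 1]."""
    it = iter(values)
    lo = hi = next(it)            # the two "memories" hold the current extrema
    for v in it:                  # every element interacts with both memories
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    if hi == lo:
        raise ValueError("all values are equal; rescaling is undefined")
    # second stage: rescale every element with the global extrema
    return [(v - lo) / (hi - lo) for v in values]


normalized = normalize_global([2.0, 4.0, 3.0])
```

Both stages need access to every element, which is the global connectivity that the remainder of the paper tries to avoid.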

Nevertheless, the two units representing the maximum 
and the minimum, respectively, need to interact subse- 
quently again with the input units, in order to put into 
effect the re-scaling operation that implements the gain 
control mechanism. This means that one would require
yet another set of non-local connections, analogous to
the pattern shown in figure 1. This led me to the question
whether such normalization can be achieved in a more 
"biological" or local fashion, or even by employing only 
interactions between adjacent network units. Presuming 
the existence of corresponding mechanisms, one has to 
explore in addition whether they could, in principle, be 
carried out by nerve cells in a biophysically plausible way. 
Below I present a network (the dynamical normalization 
network), which achieves normalization by means of lat- 
eral propagation of activity between adjacent network 
cells. To this end, parameterized diffusion operators were
developed. In their limit cases, these operators implement
nonlinear and non-conservative diffusion processes
("pseudo-diffusion"). The dynamic of pseudo-diffusion
proceeds from local to global in a continuous fashion, 
without utilizing any connectivity structure apart from 
coupling among nearest neighbors. 

The dynamic normalization network consists of a total of 
four layers: an input layer, two diffusion layers, and the 
normalization or output layer, where all layers interact. 
Numerical simulations with luminance images revealed
that the dynamic of the normalization layer is functionally
richer than just performing a re-scaling of its
input. Initially, the dynamic reveals contrast enhancement
similar to high-pass filtering.

Furthermore, under certain conditions, an adaptation
phenomenon ("dynamic compression") can be observed
in the initial phase of the dynamic. As described in
detail below (section III B), the strength of the dynamic
compression effect depends on the size of high activity
regions in the input, and their relative positions with
respect to other local maxima.



II. FORMAL DESCRIPTION OF NONLINEAR 
DIFFUSION 

The dynamic normalization network is based on nonlinear
diffusion operators. In order to prove some of their
properties, it is necessary that the nonlinear diffusion
operators are differentiable. Accordingly, we first define
an operator $T_\lambda[\cdot]$ which is parameterized over $\lambda$ as

$$ T_\lambda[x] = \frac{\eta\, x}{1 + e^{-\lambda x}} \tag{1} $$

where $\eta$ is a normalization constant that is defined as

$$ \eta = 1 + e^{-|\lambda|} \tag{2} $$

Through the specific choice of $\lambda$, we can "steer" the
operator $T_\lambda[\cdot]$ continuously from linearity ($T_0 \equiv T_{\lambda=0}$)

$$ \lambda = 0 \;\Rightarrow\; T_0[x] = x \tag{3} $$

to half wave rectification (i.e. selection of the maximum
between zero and its argument)

$$ \lim_{\lambda \to +\infty} T_\lambda[x] = \max(x, 0) \tag{4} $$

or inverse half wave rectification (i.e. selection of the
minimum between zero and its argument)

$$ \lim_{\lambda \to -\infty} T_\lambda[x] = \min(0, x). \tag{5} $$

Notice that the operator satisfies $T_{-\infty}[-x] = -T_{+\infty}[x]$.
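
As a quick numerical check of equations (1) to (5), the operator can be evaluated directly. The following Python sketch is an illustration only; the function names `eta`, `logistic`, and `T` are chosen here, and the logistic function is written in a form that avoids floating-point overflow for large $|\lambda x|$.

```python
import math


def eta(lam):
    """Normalization constant of equation (2)."""
    return 1.0 + math.exp(-abs(lam))


def logistic(z):
    """1 / (1 + exp(-z)), evaluated without overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def T(lam, x):
    """Operator of equation (1): eta * x / (1 + exp(-lam * x))."""
    return eta(lam) * x * logistic(lam * x)


linear = T(0.0, -3.5)           # lambda = 0 gives the identity, equation (3)
rectified = T(50.0, -2.0)       # large positive lambda approaches max(x, 0)
inverse_rect = T(-50.0, -2.0)   # large negative lambda approaches min(0, x)
```

For finite $\lambda$ the operator interpolates smoothly between these cases, which is what makes it differentiable.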



A. Spatially continuous nonlinear diffusion 
equation in one dimension 

With the operator $T_\lambda[\cdot]$, one can define a general
diffusion scheme which contains heat diffusion as a special
case for $\lambda = 0$. To this end, consider, without loss of
generality, the general form of a diffusion equation for a
quantity $f(x,t)$ (here referred to as "activity")

$$ \frac{\partial f(x)}{\partial t} = \frac{\partial}{\partial x}\left[ D(x)\, \frac{\partial f}{\partial x} \right] \tag{6} $$

where $D(x) > 0$ is the diffusion coefficient. If $D(x)$ depends
on $x$, then the last equation describes a nonlinear
diffusion process, otherwise ordinary heat diffusion. Consequently,
by applying the operator $T_\lambda[\cdot]$ on the gradients,
the following pseudo-diffusion process is obtained (which
reduces to heat diffusion for $\lambda = 0$):

$$ \frac{\partial f(x)}{\partial t} = \frac{\partial}{\partial x}\left[ D(x)\, T_\lambda\!\left[ \frac{\partial f}{\partial x} \right] \right] \tag{7} $$

By defining $z(x) \equiv \partial f(x)/\partial x$ and differentiating we obtain

$$ \frac{\partial f(x)}{\partial t} = \frac{\partial D(x)}{\partial x}\, T_\lambda[z] + D(x)\, \frac{\partial T_\lambda[z]}{\partial z}\, \frac{\partial z}{\partial x} \tag{8} $$

If $D(x) \equiv D = \mathrm{const.}$, the last equation reduces to

$$ \frac{\partial f(x)}{\partial t} = D\, \frac{\partial T_\lambda[z]}{\partial z}\, \frac{\partial^2 f(x)}{\partial x^2} \tag{9} $$

The last equation looks in fact like an ordinary diffusion
equation if we consider the factor $D\, \partial T_\lambda[z]/\partial z$ as
an "effective diffusion coefficient". But what effect does
the derivative $\partial T_\lambda[z]/\partial z$ have? In appendix B it is shown
that it approximates a Heaviside (or step) function $H$
for $\lambda \to +\infty$, that is

$$ \lim_{\lambda \to +\infty} \frac{\partial T_\lambda[z]}{\partial z} = H(z). \tag{10} $$

In analogy to the previous case it can be shown that

$$ \lim_{\lambda \to -\infty} \frac{\partial T_\lambda[z]}{\partial z} = 1 - H(z). \tag{11} $$

For a given cell, $\lambda$ specifies the ratio between negative
and positive influx into the cell from its neighbors. Consider
the case $\lambda \to \infty$ for a cell at position $x$. If the
activity of any adjacent cell is higher, then the gradient
$z(x) = \partial f(x)/\partial x$ will be positive and an influx of
activity to cell $x$ takes place, because in equation (8) the
pseudo-diffusion term $\partial z(x)/\partial x$ is multiplied by one as
a consequence of equation (10). Equation (10) also implies
that any negative gradient at $x$ will make the pseudo-diffusion
term be multiplied by zero, and thus prevents
an influx of negative activity into cell $x$. The essence of
this mechanism is that activity at $x$ can only increase
until any gradient has dissipated. As an alternative, this
mechanism may be understood as an auto-adaptive diffusion
constant which regulates its value according to the
current gradient at $x$ (figure [5]).

For $\lambda \to -\infty$ the mechanism works just vice versa, and
the activity for a cell at position $x$ may only decrease.
The linear (or heat) diffusion equation is obtained for
$\lambda = 0$, where both a positive-valued and a negative-valued
influx can enter the cell.
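
The influx argument above can be illustrated numerically on a short chain of cells. The sketch below is not derived from equation (7); it assumes a nearest-neighbor discretization in which every cell sums $T_\lambda$ over the differences to its existing neighbors (the one-dimensional analogue of the operator in equation (14)), explicit Euler steps, and no flux across the chain ends. The names `diffuse_chain` and `d_dt` (the product $D\,\Delta t$) as well as the test values are chosen here for illustration.

```python
import math


def T(lam, x):
    """Operator of equation (1), written with an overflow-safe logistic."""
    eta = 1.0 + math.exp(-abs(lam))
    z = lam * x
    if z >= 0.0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return eta * x * s


def diffuse_chain(f0, lam, d_dt=0.2, steps=4000):
    """Explicit Euler iteration on a 1-D chain with zero flux at both ends."""
    f = list(f0)
    n = len(f)
    for _ in range(steps):
        new = []
        for i in range(n):
            influx = 0.0
            if i > 0:                      # left neighbor exists
                influx += T(lam, f[i - 1] - f[i])
            if i < n - 1:                  # right neighbor exists
                influx += T(lam, f[i + 1] - f[i])
            new.append(f[i] + d_dt * influx)
        f = new
    return f


f0 = [0.2, 0.9, 0.1, 0.5, 0.3]
heat = diffuse_chain(f0, 0.0)        # settles at the mean of f0
maxi = diffuse_chain(f0, 1000.0)     # settles close to max(f0)
mini = diffuse_chain(f0, -1000.0)    # settles close to min(f0)
```

For finite $\lambda$ a small amount of negative influx still leaks into the highest cell, so the final plateau lies slightly below the true maximum; the leak shrinks as $\lambda$ grows.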



[Figure 2, panels (a) and (b). Panel heading: default $u_0 = 1$, $v_0 = 0$, $\Delta t = 0.1$, $t_{max} = 500$. Legend entries: $u_0 = 2$, $v_0 = 0$ ($\Delta = 2$); default; $u_0 = 1$, $v_0 = 0.5$ ($\Delta = 0.5$); $u_0 = 1$, $v_0 = 0.75$ ($\Delta = 0.25$); $t_{max} = 50000$, $\Delta t = 0.001$.]

FIG. 2: Diffusion for $0 < \lambda < \infty$. (a) The plot relates $\lambda$ of
equation (12) (ordinate) to the effective diffusion constant $\gamma$ of
equation (13) (abscissa). The different curves relate to different
simulation parameters as indicated in the legend. Parameters
that do not appear in the legend correspond to default values
as indicated in the figure heading ($\Delta t = 0.1$ integration step
size, $t_{max} = 500$ iteration limit). An increase of the value of
the gradient $\Delta = u_0 - v_0$ at $t = 0$ makes the curves $\gamma(\lambda)$ shift
to the left (arrow). With $\Delta t = 0.1$ or smaller, results do not
depend in a significant way on integration step size (dashed
line which overlaps with the curve for the default case). With
increasing increments $\Delta t$, however, all solid lines displace to
the left by the same amount. (b) The steady-state cell values
$u_\infty = v_\infty$ as a function of $\lambda$ show a sigmoidal relationship
which smoothly passes from heat diffusion ($u_\infty = [u_0 + v_0]/2$
for $\lambda = 0$) to implementing a maximum operation
($u_\infty = \max[u_0, v_0]$ for $\lambda = \infty$).



B. Intermediate values of $\lambda$ for a two cell system

Intermediate values of $\lambda$ attenuate either negative ($\lambda > 0$)
or positive influx ($\lambda < 0$). The amount of attenuation
depends on $\lambda$. To illustrate, consider a simplified pseudo-diffusion
system which consists only of two cells $u(t)$ and $v(t)$:

$$ \frac{du}{dt} = T_\lambda[v - u], \qquad \frac{dv}{dt} = T_\lambda[u - v] \tag{12} $$

Furthermore, we define the following surrogate system

$$ \frac{da}{dt} = \gamma\,(b - a), \qquad \frac{db}{dt} = a - b \tag{13} $$

with a diffusion coefficient $\gamma$. Note that because the diffusion
coefficients are different for $a(t)$ and $b(t)$ (that is, $\gamma$
and 1, respectively), the last equation implements a nonlinear
diffusion system. Without loss of generality, we
assume $\lambda > 0$, and $u_0 - v_0 > 0$ at $t = 0$. Furthermore,
let both diffusion systems have the same initial conditions
$u_0 = a_0$ and $b_0 = v_0$. With this configuration of
parameters, the influx into cell $u$ is negative, and will be
attenuated because of $\lambda > 0$. Depending on the precise
value of $\lambda$, the steady-state values $u_\infty$ and $v_\infty$ will be
situated somewhere between $(u_0 + v_0)/2$ for $\lambda = 0$, and
$\max(u_0, v_0)$ for $\lambda \to +\infty$. Now, to understand the behavior
for $0 < \lambda < \infty$, the diffusion coefficient $\gamma$ is (numerically)
determined such that both diffusion systems
(equations (12) and (13)) have the same equilibrium state,
that is $u_\infty = a_\infty$ and $v_\infty = b_\infty$ (and also $u_\infty = v_\infty$).
With the assumptions $\lambda > 0$ and $u_0 - v_0 > 0$, it follows
that $\gamma < 1$, because in order to obtain the same
steady-state values for both diffusion systems, the negative
influx into cell $a$ needs to be attenuated. Figure 2(a)
shows that in this case the effective diffusion coefficient $\gamma$
and $\lambda$ have a sigmoidal relationship. The sigmoid shifts
to the left as a function of $\Delta = u_0 - v_0$ (or equivalently
$a_0 - b_0$).

Figure 2(b) shows that steady-state values as a function
of $\lambda$ also follow a sigmoidal relationship. Cell values at
convergence smoothly pass from heat diffusion ($u_\infty =
[u_0 + v_0]/2$ for $\lambda = 0$) to implementing a maximum operation
($u_\infty = \max[u_0, v_0]$ for $\lambda = \infty$). Analogous considerations
hold for negative values of $\lambda$.

C. Spatially discrete pseudo-diffusion equation in
two dimensions



Based on a centered finite difference representation of
the Laplacian operator, we define a parameterized diffusion
operator acting on a function $f(x,y)$ as

$$ \mathcal{K}_\lambda f(x,y) = T_\lambda[f(x+1,y) - f(x,y)] + T_\lambda[f(x-1,y) - f(x,y)] + T_\lambda[f(x,y+1) - f(x,y)] + T_\lambda[f(x,y-1) - f(x,y)] \tag{14} $$

where a grid spacing of $\Delta x = \Delta y = 1$ is assumed. We
will make use of the following compact notation

$$ \mathcal{K}_{+\infty} \equiv \lim_{\lambda \to +\infty} \mathcal{K}_\lambda \tag{15} $$

and

$$ \mathcal{K}_{-\infty} \equiv \lim_{\lambda \to -\infty} \mathcal{K}_\lambda. \tag{16} $$

Note that $\mathcal{K}_{\lambda=0} \approx \nabla^2$ from equation (3).

FIG. 3: Pseudo-diffusion converges faster than heat diffusion (Peppers image). Curves show the temporal evolution of
the mean activity together with standard deviations for heat diffusion "$\lambda = 0$" (no symbols, eq. (17)), max-diffusion "$\lambda = +\infty$"
(circles, eq. (20)), and min-diffusion "$\lambda = -\infty$" (triangles, eq. (19)). For computing the mean activity, averaging took place over
all values of the respective (pseudo-)diffusion layer. The Peppers image ($0 \le s_{ij} \le 1$, size 128 x 128 pixels, inset) defined the
initial state of each layer. (a) Mean activity remains constant with time with heat diffusion (heat diffusion is conservative), but
approaches the minimum (maximum) value of $s_{ij}$ in the min-diffusion layer (max-diffusion layer). (b) The minimum (maximum)
is finally adopted by all cells $a_{ij}$ ($b_{ij}$), as indicated by decreasing standard deviations. Compared to heat diffusion, the
pseudo-diffusion systems converge in fewer simulation time steps to a uniform state, but for their simulation more computations
per time step are needed (cf. equation (1)). Moreover, pseudo-diffusion does not converge after 2 · 128 iterations (the largest
distance between two cells on an N x N grid is 2N in a Manhattan architecture). A single iteration is insufficient to propagate
a maximum from one cell to the next, as all diffusion operators are normalized by the number of adjacent cells (in addition,
$D \Delta t < 1/2$, see section|X]|). Note further that the standard deviation of the max-diffusion layer has a local maximum (arrow).
This at first sight paradoxical effect is explained with figure 4.

In order to formulate a spatially discrete pseudo-diffusion
scheme, we consider a diffusion layer (i.e. a finite grid on
which diffusion takes place) with an equal number $N$ of
rows $i$ and columns $j$, that is $1 \le i,j \le N$. We use a
discrete-in-space and continuous-in-time notation, where
$f_{ij} \equiv f(j,i)$, $f_{i,j+1} \equiv f(j+1,i)$ and so on [12]. With the
above definitions, heat diffusion is described as:

$$ \frac{d f_{ij}(t)}{dt} = D \cdot \mathcal{K}_0\, f_{ij}(t) \tag{17} $$

where $D = \mathrm{const.}$ is the diffusion coefficient. The process
is assumed to start at time $t = t_0$ with the initial
condition $f_{ij}(t_0) = s_{ij}$. From now on we assume that the
$s_{ij}$ represent an intensity or luminance distribution (i.e.,
$s$ represents a gray level image). Since diffusion takes
place in a bounded domain (i.e. we have a finite number
N x N of grid points), and we also use adiabatic boundary
conditions (i.e. there is neither inward flux nor any
flux outward over the domain boundary, i.e. $\partial f_{ij}/\partial t = 0$
for $(i,j) \in \{(i,0), (i,N), (0,j), (N,j)\}$) [13], the total activity
described by equation (17) does not depend on time
(figure 3), that is

$$ \sum_{i,j}^{N} f_{ij}(t) = \mathrm{const.} \tag{18} $$
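
The conservation property of equation (18) can be checked numerically. The following Python sketch assumes explicit Euler steps with $D\,\Delta t = 0.2$, an 8 x 8 test pattern, and boundary handling in which neighbors outside the grid are simply skipped, so that no activity crosses the border; these choices, like the names `heat_step`, `total`, and `spread`, are made here for illustration and are not taken from the simulations in this paper.

```python
def heat_step(f, d_dt=0.2):
    """One explicit Euler step of equation (17) with the lambda = 0 operator."""
    n = len(f)
    new = [row[:] for row in f]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k, l in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= k < n and 0 <= l < n:     # skip neighbors outside the grid
                    acc += f[k][l] - f[i][j]      # T_0[x] = x, equation (3)
            new[i][j] = f[i][j] + d_dt * acc
    return new


def total(f):
    """Total activity summed over all cells."""
    return sum(map(sum, f))


def spread(f):
    """Difference between the largest and the smallest cell value."""
    return max(map(max, f)) - min(map(min, f))


# deterministic test pattern with values in [0, 1]
s = [[((3 * i + 5 * j) % 7) / 6.0 for j in range(8)] for i in range(8)]
f = [row[:] for row in s]
for _ in range(200):
    f = heat_step(f)
```

Because every exchange between two neighboring cells enters the two updates with opposite signs, `total(f)` stays at `total(s)` up to rounding, while `spread(f)` shrinks as the image is blurred.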



The last expression states that heat diffusion is conservative:
activity is neither created nor destroyed. Although
the 2-D heat diffusion equation cannot create new activity
levels which have not already been present at time $t_0$
[14], it can create extrema in activity domains that have
a dimension greater than one (cf. [15], p. 532). A min-diffusion
layer will eventually compute the minimum of
all values and is defined as:

$$ \frac{d a_{ij}(t)}{dt} = D \cdot \mathcal{K}_{-\infty}\, a_{ij}(t) \tag{19} $$

A max-diffusion layer will eventually compute the maximum
of all values and is defined as:

$$ \frac{d b_{ij}(t)}{dt} = D \cdot \mathcal{K}_{+\infty}\, b_{ij}(t). \tag{20} $$

We assume equal initial conditions $a_{ij}(t_0) = b_{ij}(t_0) = s_{ij}$
for the min-diffusion layer and the max-diffusion layer at
time $t = t_0$.

Whereas equation (17) preserves its total activity, the min-diffusion
layer and max-diffusion layer, respectively, do
not. The total activity of the min-diffusion layer decreases
with time and converges to (figure 3)

$$ \lim_{t \to \infty} \sum_{i,j}^{N} a_{ij}(t) = N^2 \min_{i,j}\{s_{ij}\}. \tag{21} $$

The total activity of the max-diffusion layer increases
with time and converges to (figure 3)

$$ \lim_{t \to \infty} \sum_{i,j}^{N} b_{ij}(t) = N^2 \max_{i,j}\{s_{ij}\}. \tag{22} $$

In other words, all cells $a_{ij}$ of the min-diffusion layer will
finally contain the global minimum of the input

$$ A := \min_{i,j}\{s_{ij}\} = \lim_{t \to \infty} a_{ij}(t) \quad \forall i,j \tag{23} $$

and all cells $b_{ij}$ of the max-diffusion layer will end up
with the global maximum

$$ B := \max_{i,j}\{s_{ij}\} = \lim_{t \to \infty} b_{ij}(t) \quad \forall i,j. \tag{24} $$

FIG. 4: Why standard deviations can increase with pseudo-diffusion. Standard deviations of the min-diffusion layer
and the max-diffusion layer reach a local or global maximum if at some time in the layers a configuration is obtained which
consists of two domains with each domain having a different activity level (similar to a luminance step). Of course, a necessary
precondition is that the initially provided configuration (here three domains with gray $s_{ij} = 0.5$, black $s_{ij} = 0$, and white
$s_{ij} = 1$, see inset) has a smaller standard deviation than the "step"-like configuration which is generated as an intermediate
state. In the max-diffusion layer, for example, a step-like configuration is reached as soon as the black frame is dissolved from
both sides (i.e., "eaten" by the gray and the white region). The effect depends on luminance levels and the relative amount of
black, white, and gray. It can also be obtained for different layouts of the regions (e.g., gray-white-black or black-gray-white).

This can be explained as follows. A cell $a_{ij}$ of the min-diffusion
layer may only decrease its activity from one
time step to the next, until any activity gradient between
$a_{ij}$ and its nearest neighbors has dissipated. As
a consequence, $a_{ij}$ adopts the minimum activity value
of the neighborhood, including itself. Because the last
arguments apply to all cells $a_{ij}$, eventually all cells will
adopt the minimum activity $\min_{i,j}\{s_{ij}\}$ at convergence.
Convergence occurs if $a_{ij} = a_{kl}\ \forall (i,j),(k,l)$ (i.e. when
no more activity gradient exists). The dynamic of the
process is illustrated by figure 5.

In an analogous way, in the diffusion process described by
the max-diffusion layer, all cells $b_{ij}$ can only increase
their activity, given the existence of any activity gradient.
If any cell has a maximum activity value, then finally all
cells will adopt this maximum, since only then all gradients
have dissipated.

Hence, both nonlinear diffusion systems are non-conservative,
because they do not fulfill requirements
analogous to equation (18).
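
The convergence statements of equations (21) to (24) can be reproduced on a small grid. The sketch below uses the limit operators of equations (4) and (5) directly, explicit Euler steps with $D\,\Delta t = 0.2$, and skips neighbors outside the grid; the grid size, step count, test pattern, and all function names are assumptions chosen for illustration.

```python
def half_wave(x):
    """T for lambda -> +infinity, equation (4)."""
    return max(x, 0.0)


def inverse_half_wave(x):
    """T for lambda -> -infinity, equation (5)."""
    return min(0.0, x)


def pseudo_diffusion(s, rectify, steps=400, d_dt=0.2):
    """Iterate equation (19) or (20), depending on the rectifier passed in."""
    n = len(s)
    f = [row[:] for row in s]
    for _ in range(steps):
        new = [row[:] for row in f]
        for i in range(n):
            for j in range(n):
                for k, l in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= k < n and 0 <= l < n:   # nearest neighbors only
                        new[i][j] += d_dt * rectify(f[k][l] - f[i][j])
        f = new
    return f


N = 6
s = [[((3 * i + 5 * j) % 7) / 6.0 for j in range(N)] for i in range(N)]
a = pseudo_diffusion(s, inverse_half_wave)   # min-diffusion layer
b = pseudo_diffusion(s, half_wave)           # max-diffusion layer
total_a = sum(map(sum, a))
total_b = sum(map(sum, b))
A = min(map(min, s))
B = max(map(max, s))
```

After enough iterations every cell of `a` holds `A` and every cell of `b` holds `B`, so the totals approach `N * N * A` and `N * N * B`, although no cell ever communicated with anything but its nearest neighbors.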



III. DYNAMIC NORMALIZATION BY NEXT 
NEIGHBOR INTERACTIONS 

Equipped with the pseudo-diffusion operators defined
in the last section, we are now ready to define the dynamic
normalization network. The network normalizes a
given input $s_{ij}$ with respect to numerical range, but without
resorting to any global memory for determining
the minimum and maximum. Rather, the global minimum
and maximum are computed in the min-diffusion
layer and the max-diffusion layer, respectively, by only
exchanging information between adjacent cells.
We start with the following linear scaling scheme, which
is typically used for normalizing a fixed set of numbers
(again, see introduction):

$$ c_{ij} = \frac{s_{ij} - a_{ij}}{b_{ij} - a_{ij}} \quad \text{where } 1 \le i,j \le N. \tag{25} $$

Because of equations (23) and (24) the variable $c_{ij}$ will contain
(after a sufficiently long time) a normalized representation
of $s_{ij}$, that is

$$ s_{ij} \in [A, B] \;\mapsto\; c_{ij} \in [0, 1] \tag{26} $$

($A$ and $B$ are the global minimum and maximum, respectively,
of $\{s_{ij}\}$). To arrive at a fully dynamical system,
we formally interpret equation (25) as the steady-state solution
of

$$ \frac{d c_{ij}}{dt} = b_{ij}\,(0 - c_{ij}) - a_{ij}\,(1 - c_{ij}) + s_{ij} \tag{27} $$

which shall be called dynamic normalization. Notice that
by using dynamic normalization we naturally avoid the
singularity of equation (25) that occurs for $b_{ij} = a_{ij}$.

[Figure 5: snapshots at 1, 10, 25, 50, 100, 500, and 1000 iterations.]

FIG. 5: Snapshots of diffusion states. Images show snapshots of max-diffusion ($b_{ij}$, first row), heat diffusion ($f_{ij}$, middle
row), and min-diffusion ($a_{ij}$, last row) for the Peppers image (size 128x128 pixels). The numbers indicate elapsed iterations.
Whereas heat diffusion just blurs the image, min-diffusion and max-diffusion create "islands" corresponding to local minima
and maxima, respectively. With increasing time, islands decrease in number and increase in size, until eventually a single island
occupies the whole region. Then, the min-diffusion layer and the max-diffusion layer have computed the global
minimum and maximum, respectively. Due to our boundary conditions (see methods), diffusion is faster at domain boundaries
(see section fA}). Brighter gray levels correspond to higher cell activities.

Figure 6 visualizes the state of equation (27) at different
time steps. Initially, the dynamic normalization process
is similar to high-pass filtering (figure 7), which can be
explained as follows. Contrasts are abrupt changes in
luminance. Consider a luminance change from dark to
bright. Then, the dark side has a local minimum, and
the bright side a local maximum, which propagate in the
min-diffusion layer and the max-diffusion layer, respectively
(figure 5). When the local minimum (maximum)
has propagated to the position of the bright (dark) side
of the step, then the bright (dark) side will be normalized
to one (zero). As the dynamic continues to evolve, local
maxima and minima propagate further, thereby "eating"
(i.e., annihilating) other smaller local maxima and minima.
In figure 6 this annihilation of local maxima and
minima, respectively, is visible through a gradual filling-in
of image structures from the boundaries. A normalized
version of the original image is finally obtained when
$dc_{ij}/dt = 0$. Depending on (i) how small the $s_{ij}$ are, and
(ii) the choice of integration step size $\Delta t$, the steady state
of $c_{ij}$ can be reached with delay compared to the steady
states of $a_{ij}$ and $b_{ij}$, respectively. This is now examined
in more detail.
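
Before turning to the convergence analysis, the complete network of equations (19), (20), and (27) can be simulated on a small grid. The following Python sketch is a minimal illustration under assumptions made here: explicit Euler integration with $\Delta t = 0.2$ and $D = 1$, the limit operators $\min(0, x)$ and $\max(x, 0)$, neighbors outside the grid skipped, and a 5 x 5 test input with values between 0.2 and 0.7. It is not the code used for the simulations reported in this paper.

```python
def neighbors(i, j, n):
    """4-neighborhood inside an n x n grid; cells outside the grid are skipped."""
    cand = ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
    return [(k, l) for k, l in cand if 0 <= k < n and 0 <= l < n]


def dynamic_normalization(s, steps=600, dt=0.2, D=1.0):
    n = len(s)
    a = [row[:] for row in s]              # min-diffusion layer, equation (19)
    b = [row[:] for row in s]              # max-diffusion layer, equation (20)
    c = [[0.0] * n for _ in range(n)]      # output layer, equation (27), c(0) = 0
    for _ in range(steps):
        a_new = [row[:] for row in a]
        b_new = [row[:] for row in b]
        for i in range(n):
            for j in range(n):
                nb = neighbors(i, j, n)
                a_new[i][j] += dt * D * sum(min(0.0, a[k][l] - a[i][j]) for k, l in nb)
                b_new[i][j] += dt * D * sum(max(0.0, b[k][l] - b[i][j]) for k, l in nb)
                # equation (27): dc/dt = b (0 - c) - a (1 - c) + s
                c[i][j] += dt * (b[i][j] * (0.0 - c[i][j])
                                 - a[i][j] * (1.0 - c[i][j]) + s[i][j])
        a, b = a_new, b_new
    return a, b, c


s = [[0.2 + 0.5 * ((3 * i + 5 * j) % 7) / 6.0 for j in range(5)] for i in range(5)]
a, b, c = dynamic_normalization(s)
A = min(map(min, s))
B = max(map(max, s))
target = [[(v - A) / (B - A) for v in row] for row in s]   # equation (25) with a = A, b = B
```

At the first step all three layers agree with the input, so the right-hand side of equation (27) is zero and no division by $b_{ij} - a_{ij}$ is ever needed. After convergence `c` matches `target`, with at least one cell at zero and one cell at one.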



A. Time to convergence for dynamic normalization 

Figure 8 shows the relationship between the time to
convergence and the numerical range of the input: the
smaller the $s_{ij}$, the more iterations are necessary to accomplish
the mapping expressed by equation (26). Mathematically,
this can be seen as follows. Assume that a
general solution of equation (27) has the form

$$ c_{ij}(t) = C_0\, e^{-t/\tau} + C_1 \tag{28} $$

where $C_0$ and $C_1$ are constants which are defined by the
initial conditions, and $\tau$ is a time constant. Plugging the
last equation into equation (27) yields

$$ C_0\, e^{-t/\tau} \left[ b_{ij} - a_{ij} - \frac{1}{\tau} \right] = -C_1\,(b_{ij} - a_{ij}) + s_{ij} - a_{ij} \tag{29} $$

By identifying

$$ \tau \equiv \tau_{ij}(t) = \frac{1}{b_{ij}(t) - a_{ij}(t)} \tag{30} $$

we obtain

$$ C_1 = \frac{s_{ij} - a_{ij}}{b_{ij} - a_{ij}} \tag{31} $$

and from the initial condition $c_{ij}(t = 0) = 0\ \forall i,j$ we furthermore
get $C_0 = -C_1$, which finally gives the solution

$$ c_{ij}(t) = \frac{s_{ij} - a_{ij}}{b_{ij} - a_{ij}} \left( 1 - e^{-t/\tau_{ij}} \right) \tag{32} $$

[Figure 6: snapshots at 1, 10, 25, 50, 100, 500, and 1000 iterations.]

FIG. 6: Snapshots of dynamic normalization. The images show equation (27) at different time steps (indicated by the
numbers) for images (size 128x128 pixels) Lena (top row) and Peppers (bottom row). The dynamic begins similar to high-pass
filtering (cf. figure 7), proceeds with contrast enhancement and "fills in" image structures from their contrast contours, until
finally a normalized version of the input image is obtained. Brighter gray levels indicate higher cell activities. In order to
improve visualization, images were rescaled individually.

[Figure 7: grating, 256 x 256 pixels; axes: iterations, cycles/image.]

FIG. 7: Spatial frequency vs. time for a sine wave grating.
This experiment is analogous to figure 8(b), but here for
a sine wave grating (size 256x256 pixels). At each time, the
graphs show the maximum activity value of equation (27).
Obviously, at a fixed number of iterations ($\lesssim 64$), the dynamic
normalization network's signal transmission characteristic is
high-pass. No frequency selectivity is observed after convergence
($\gtrsim 128$ iterations).



On grounds of the definition of r fequation l3"0|) we obtain 
two insights. 

First, since the time constant $\tau$ of the dynamic normalization process is a
function of both $a_{ij}(t)$ and $b_{ij}(t)$, it is not really a constant, but rather
depends on time and space because of equations [19] and [20], respectively. However,
$\tau$ can be approximated by recalling that $a_{ij}(t)$ and $b_{ij}(t)$ converge in
time and space to the global minimum $A$ and global maximum $B$, respectively, of the
input $s_{ij}$ (equation [23] and [M]). Thus, $\tau \approx 1/(B - A)$. This leads
to the second insight: the smaller $A$ and $B$ are (more precisely, the smaller their
difference $B - A$), the longer it takes dynamic normalization to converge to a steady
state. Or, otherwise expressed, the smaller $\tau$ is, the faster
the system converges.
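
The dependence of convergence time on the input range can be illustrated numerically. The following is a minimal sketch, assuming that a cell relaxes exponentially toward its steady state with the approximate time constant 1/(B - A); the function name `time_to_fraction` and the first-order relaxation form are assumptions made for illustration and are not taken from the equations of this paper.

```python
import math


def time_to_fraction(A, B, fraction=0.5):
    """Time at which c(t) = c_inf * (1 - exp(-t / tau)) reaches `fraction` * c_inf.

    Assumption: tau = 1 / (B - A), the approximation discussed in the text.
    """
    if B <= A:
        raise ValueError("need B > A")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    tau = 1.0 / (B - A)
    return -tau * math.log(1.0 - fraction)


# Smaller input ranges give proportionally longer convergence times.
for B in (1e-3, 1e-2, 1e-1, 1.0):
    print(B, time_to_fraction(0.0, B))
```

Under these assumptions, reducing the input range by a factor of ten increases the time to reach half of the steady-state activity by the same factor.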

Notice that when using the steady-state solution (equation [25]) of dynamic
normalization instead of the full dynamic process (equation [27]), no dependency on
input contrast is revealed, and the dependence on the spatial frequency structure of
the input is much weaker.



B. Transient adaptation or dynamic compression 

The dynamic normalization layer reveals distinct dynamic phases. In the initial phase,
image contrasts are extracted. Contrast enhancement occurs in a subsequent phase. In
the final phase, the activity distribution in the dynamic normalization layer is just
a re-scaled version of the input. In an intermediate phase between the initial and
the final phase, one observes adaptation: image structures with substantially
different light intensities in the input are mapped to a smaller range of activities
in the dynamic normalization layer. This effect is the dynamic range compression. To
illustrate it, an input image was subdivided into four quadrants ("contrast tiles",
figure [9]). Each of the tiles has a different range of luminance values. Because the
available tonal range for displaying the tiled image is too small to match the range
of all tiles, some of the image details in the darker tiles are displayed in black.
Nevertheless, some of these details are rendered visible in the dynamic normalization
layer at around 100 iterations (top row in figure [9]), implying that cell activities
in this layer have less dynamic range than in the input. The compression effect is
quantified in figure [10], where each curve represents the mean activity
and the maximum activity, respectively, of all cells of one of
FIG. 8: Time to convergence. Both of the graphs show simulation results of the dynamic
normalization process with a chessboard image as input (size 128x128 pixels). For both
graphs, "convergence" was defined as the moment when the average activity of all cells
in the dynamic normalization layer reached $d = 0.25$ (note that at full normalization
$d = 0.5$). (a) Time to convergence of the dynamic normalization process depends in
the first place on luminance contrast, and to a lesser extent on spatial frequency
content of the input (legend: chessboard spatial frequency $k$ in cycles per image) -
curves for different spatial frequencies are similar. The simulation results are
therefore consistent with equation [30]. Chessboard contrast was set to the values
indicated on the abscissa: $s_{ij} \in [0, B]$ with
$B \in \{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$. (b) Mean activity of
the dynamic normalization layer is indicated by colors (inset: colorbar) as a function
of iterations (abscissa) and luminance contrast (ordinate) of the chessboard image
(4 cycles per image). The dynamic of the normalization process reveals a sigmoidal
behavior which consists of a plateau with low activity (top, turquoise), a relatively
short rising phase (blue), and a plateau with high activity (pink, bottom), where
convergence has occurred.



compression to occur is that the global maximum propagates with finite speed in the
max-diffusion layer, and that it is spatially separated from image structures that
have less dynamic range (= local maxima). As long as the global maximum has not yet
propagated to the local maxima, image structures are normalized by their "own"
local maxima. Since normalization rescales all cell activities to the same target
range (all image structures normalize to one), local normalization implies a
reduction of the dynamic range. However, local maxima are annihilated as the global
maximum propagates, and image structures are then normalized by the global maximum.
Then, the entire dynamic range of the input image is recovered in the normalization
layer, and dynamic range compression is abolished. The recovery of the original
dynamic range can be seen when the entropy curves of figure [12]b) reduce to the
entropy of the input image at $\approx 1000$ iterations (dashed horizontal line).



C. Process entropy 

Figure [12]b) shows entropy as a function of time computed over the dynamic
normalization layer. The entropy reaches a maximum in the time window where dynamic
compression occurs. Notice that this maximum in entropy exceeds the entropy of the
input image (dashed horizontal line). Because entropy quantifies the degree of
flatness of a histogram (or probability distribution), the observed entropy maximum
implies that cell activities of the dynamic normalization layer are more homogeneously
distributed across the histogram than luminance values of the input. Figure [13] shows
how the distribution of activities evolves over time. Initially, cell activities in
the dynamic normalization layer are small, and tend to cluster around a single spot in
the histogram (the cropped "hot spot" in the upper left corner of the histogram).
Emanating from this "hot spot", values start to occupy nearly the entire histogram.
It is just then that an observer who is monitoring the output of the dynamic
normalization network gathers the highest information about the input image.

In the subsequent part of the dynamic, the values are redistributed again in a way
that they concentrate around four principal stripes. These stripes correspond to the
four contrast tiles. Therefore, dynamic range compression is compatible with
adaptation, since adaptation maximizes the transfer of information [18].



D. Adaptation-by-entropy-maximisation



the four tiles. The curves approach each other at around 100 iterations. Thus, the
output of the dynamic normalization network can be encoded with a numerical range
smaller than the original one.

Figure [11] illustrates the mechanism which underlies dy- 
namic compression. A necessary condition for dynamic 



When computing the Shannon entropy of the output
of the dynamic normalization network [17], one observes
an entropy maximum at the dynamic compression effect 
(figure [T121 and [T5]) . Hence, a straightforward algorithm 
for the adaptation of images is to stop the dynamic nor- 
malization process when an entropy maximum is reached 
FIG. 9: Dynamic compression. Same as figure [6] but here for a "contrast-tiled" version of Peppers. Contrast-tiling means
that the original image (entropy 8 bits, tiled image > 14 bits, see figure [12]a)) was subdivided in quadrants ("tiles") to
obtain the dynamic range of luminance values found in a typical outdoor scene (values were taken from 0, table 1). At an
intermediate number of iterations (around 100), dynamic range compression occurs, where details in the darker tiles become more
visible. Top row: Noise-free dynamic - compare with figure [10]. The input image is shown as inset in figure [10]b). Middle
row: Dynamic for additive Gaussian noise (see section III E 1) with standard deviation $\sigma = 0.001$ and zero mean - see also
figure [16]. Evidently, moderate levels of noise improve dynamic compression and thus the visibility of the darker tiles. Bottom
row: The tiles were disconnected from each other (i.e. no diffusion could take place across different contrast tiles), and the
dynamic normalization mechanism now treats each tile as a separate image. Notice that in the first two rows all four tiles are
connected (i.e., the tiles are treated as a single image), and activity propagates between tiles (as is visible from the black
shadow in the lower tiles at around 100 iterations).



("one feedback loop"). To further enhance the dynamic compression effect, the output
at the entropy maximum can be taken again as input to the dynamic normalization
network, and once again we can let the dynamic normalization process continue until
it reaches a maximum of entropy ("two feedback loops"). The entropy across 10
feedback loops of the just described algorithm is illustrated in figure [14] with the
curve designated by "process entropy". Figure [15] shows the output images obtained
for one, two, three and 20 feedback loops. With increasing number of feedback loops,
luminance information is suppressed, while contrasts are enhanced. At around 20
loops, one obtains an image which seems to contain only contours, but iterating
further also enhances noise and leaves one with an image without any recognizable
structures. Figure [14] shows that entropy decreases with increasing number of
feedback loops (each data point is the entropy of the output at the indicated number
of feedback loops). For the "tiled" and the "4th power" Peppers image, the entropy
versus feedback loops has a maximum. In conclusion, in terms of entropy, but also by
visual inspection, a small number of feedback loops (one or two) seems optimal for
the proposed adaptation algorithm. The algorithm should be understood as a
"proof-of-concept" rather than a definite tool for image processing, because it
occasionally develops artifacts. For example, future versions could address the
suppression of the dark zone which emanates from the tiles of the "tiled"
Peppers image.
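
The stopping rule of one feedback loop can be sketched in a few lines. This is a hedged illustration only: `shannon_entropy` uses a plain equal-width histogram, and `run_until_entropy_max` accepts an arbitrary update function `step` as a stand-in for one iteration of the dynamic normalization network, which is not reproduced here. Both function names, the default bin count, and the "stop at the first decrease" criterion are assumptions.

```python
import math


def shannon_entropy(values, bins=256):
    """Shannon entropy (in bits) of an equal-width histogram of `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values fall into a single bin
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[k] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def run_until_entropy_max(state, step, max_iter=1000, bins=256):
    """Iterate `step` and return (state, entropy) at the first entropy maximum.

    `step` is a hypothetical callable mapping the current flat list of cell
    activities to the next one. Iteration stops once entropy decreases.
    """
    best, best_h = state, shannon_entropy(state, bins)
    for _ in range(max_iter):
        state = step(state)
        h = shannon_entropy(state, bins)
        if h < best_h:
            break
        if h > best_h:
            best, best_h = state, h
    return best, best_h
```

Further feedback loops would call `run_until_entropy_max` again with the returned state as the new input. A real entropy curve may fluctuate, so a practical version would likely need smoothing or a tolerance before declaring a maximum.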



E. Sensitivity of dynamic normalization for noise 

One may argue that an adaptive mechanism designed in a way suggested by dynamic
normalization is highly sensitive to noise, because it is based on the computation
of minimum and maximum operations. To address this issue, we further distinguish
between static noise (i.e. an offset added to the input $s_{ij}$ which does not vary
with time), and dynamic noise (i.e. an offset added to each layer which varies with
time). For the first case we presume the existence of a noise-free input pattern, to
which static noise is added. A worst case scenario arises if a couple of cells
$s_{ij}$ have high activities due to noise ("noisy cells"), which leads to an
undesired increase of the true dynamic range of the input. If a read-out mechanism
for the dynamic normalization layer had only the same dynamic range as the input,
then the noise would obscure the relevant information of the input at convergence.
Nevertheless, if there were only a few noisy cells in the input layer, then the
dynamic compression effect could mitigate the worst case scenario to some extent.
To assess the robustness of dynamic normalization
against temporally varying noise, numerical experiments
FIG. 10: Dynamic compression of input range. Each curve quantifies the activity
across one of the four tiles (inset) of the dynamic shown in the top row of
figure [9]: (a) mean activity, and (b) maximum of activity. Within the time window
where dynamic compression is seen, initially separated curves approach each other,
and subsequently depart again. Other real-world images give similar results.

were conducted with additive, normal-distributed noise ("Gaussian noise"), with zero
mean and standard deviation $\sigma$. In addition, simulations were conducted
with multiplicative, uniform noise ("white noise").

1. Additive, normal-distributed noise

Temporally fluctuating normal-distributed noise $\xi_{ij}(t)$ was added to the
equations [19], [20], [27] and the input $s_{ij}$, according to

$x_{ij} \leftarrow x_{ij} + \sigma\,\xi_{ij}(t)$. (33)

In the last equation, $x_{ij}$ stands for one of the variables $a_{ij}$, $b_{ij}$,
$c_{ij}$, and $s_{ij}$, respectively, and "$\leftarrow$" means that the left hand side
is replaced by the right hand side. The noise level is specified by $\sigma$ (assuming
zero mean), and $\xi_{ij}(t)$ is assumed to be uncorrelated across time and/or
spatial positions. A luminance step (32 x 32 pixels) was used as input, with luminance
value zero on the dark side ("black patch", columns 1 to 16), and 1 on the bright
side ("white patch", columns 17 to 32). Thus, the mean activity of the noise free
system should approach one at steady-state. We furthermore computed the Michelson
contrast $M_{ij}$ at each position according to

$M_{ij} = \dfrac{c_{i,j \in \mathrm{white}} - c_{i,j \in \mathrm{black}}}{c_{i,j \in \mathrm{white}} + c_{i,j \in \mathrm{black}}}$ (34)

where $j \in \mathrm{black}$ means $1 \le j \le 16$ and $j \in \mathrm{white}$
means $17 \le j \le 32$ (the row index runs over all positions $1 \le i \le 32$).
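
A contrast measure of this kind can be computed as in the following sketch. It uses the standard Michelson definition (difference divided by sum) and, as an assumption for illustration, compares the row-wise mean activity of the bright half with that of the dark half; the pairing of positions used in the actual simulations is not specified here, and the function names are hypothetical.

```python
def michelson_contrast(white, black):
    """Michelson contrast (white - black) / (white + black); 0 if both are 0."""
    total = white + black
    return 0.0 if total == 0 else (white - black) / total


def mean_step_contrast(c, split=16):
    """Average, over rows, of the contrast between bright and dark half.

    `c` is a list of rows; columns [0, split) form the black patch and the
    remaining columns the white patch (an assumed layout).
    """
    contrasts = []
    for row in c:
        black = sum(row[:split]) / split
        white = sum(row[split:]) / (len(row) - split)
        contrasts.append(michelson_contrast(white, black))
    return sum(contrasts) / len(contrasts)


# Noise-free luminance step: zeros in columns 1 to 16, ones in columns 17 to 32.
step = [[0.0] * 16 + [1.0] * 16 for _ in range(32)]
print(mean_step_contrast(step))  # 1.0
```

An offset that lifts the dark patch lowers the contrast: with a black patch at 0.5 and a white patch at 1.0, the value drops to 1/3.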

Figure [17]a) shows the temporal evolution of the mean activity
$\langle c_{ij} \rangle_{ij \in \mathrm{white}}$ (i.e. averaged over white patch
positions) for various noise levels $\sigma$. Sufficiently high noise levels
significantly affect the convergence behavior of dynamic normalization - the response
plateau which is seen in the noise-free case is no longer reached. Instead of the
plateau, a maximum is approached, the amplitude of which decreases with increasing
noise level. Figure [17]b) shows that a similar behavior is also seen for the averaged
Michelson contrast $\langle M_{ij} \rangle$: the contrast between the black and the
white patch decreases with increasing noise level. This implies that image structures
are obscured by noise.

How does noise influence dynamic range compression? Three answers exist to this
question, and they depend on the noise level. For relatively small noise levels
($\sigma < 0.001$), no dramatic effect on dynamic compression is observed. For
intermediate noise levels ($\sigma \approx 0.001$), dynamic compression is enhanced
(middle row in figure [9] and figure [16]). Enhancement happens because the net effect
of noise is to add an offset, which "lifts" the darker patches of the tiled Peppers
image. For higher noise levels, however, the darker patches drown in noise and image
details get lost. Consequently, if the goal of dynamic normalization was adaptation,
then suitably chosen noise levels would help to enhance range compression, although
this comes at the price of reduced contrasts in regions with low activities (dark
quadrants in the tiled Peppers image).

Notice that additive Gaussian noise can be easily coun- 
teracted by proposing additional mechanisms with low- 
pass characteristics, like spatial or temporal pooling of 
activity. Then, as long as the noise is not correlated over 
positions and time, it would simply average out. 

2. Multiplicative, uniformly distributed noise

Multiplicative noise was applied to variables $a_{ij}$, $b_{ij}$, $c_{ij}$, and
$s_{ij}$, respectively, according to

$x_{ij} \leftarrow x_{ij} \cdot \left(1 + \eta\,(\zeta_{ij}(t) - 1)\right)$ (35)

with $0 \le \zeta_{ij}(t) \le 1$ representing uniformly distributed
noise which was uncorrelated across time and/or space.







FIG. 11: Understanding dynamic range compression. The dynamic compression effect is a consequence of the fact that the global
maximum propagates with finite speed in the max-diffusion layer. Thus, the smaller the initial region occupied by the global
maximum in the max-diffusion layer, and the greater the distance of this region from other regions of smaller cell activities or
local maxima, respectively, the longer the persistence of the effect. This is illustrated with two images (insets). The images
consist of zeros except for small squares with different luminance values. The global maximum corresponds to the white square
(s = 1, upper left in the images). The luminance values of the squares are the same as the maximum value of each quadrant in
the tiled Peppers image of figure [10], that is s = 1, s = 0.24615, s = 0.04385, and s = 0.02231, respectively. (a) The squares are
maximally separated, and the global maximum reaches the other local maxima relatively late (mean activities were computed
across the same tiles as before with the tiled Peppers image). As a consequence, each square is independently normalized by
its local maximum until it gets invaded by a higher activity value. (b) Moving the four squares closer to each other goes along
with a shorter duration of the dynamic compression effect. Were all four squares moved into the center of the image such
that they touch each other, virtually no dynamic compression effect would be revealed, because the global maximum would
instantaneously propagate to all four tiles: all local maxima would get normalized by the global maximum right away.



The noise level is specified by $\eta \in [0, 1]$. Dynamic normalization is not
significantly affected by this type of noise, not even for $\eta = 1$ (hence results
are not shown). Multiplicative noise acts differently on maxima and minima.
Maximum activities can only decrease, but never increase beyond their value in the
noise free case. Therefore, no spurious maxima are introduced into the max-diffusion
layer by the type of multiplicative noise considered here. On the other hand,
multiplicative noise can inject spurious minima into the min-diffusion layer, if the
lowest luminance value in the input image was bigger than zero. As the minimum
luminance values of our images were always zero, they are consequently not affected
by the multiplicative noise.
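
The asymmetry between maxima and minima can be checked with a small sketch of the multiplicative update rule. The function name `multiplicative_noise`, the variable names `eta` (noise level) and `zeta` (uniform random number in [0, 1)), and the use of Python's `random` module are assumptions for illustration.

```python
import random


def multiplicative_noise(x, eta, rng):
    """Apply x <- x * (1 + eta * (zeta - 1)) element-wise, zeta uniform in [0, 1).

    For 0 <= eta <= 1 the factor lies in [1 - eta, 1), so non-negative
    activities can shrink but never grow, and zeros stay zero.
    """
    return [v * (1.0 + eta * (rng.random() - 1.0)) for v in x]


rng = random.Random(0)  # fixed seed, only to make the run repeatable
clean = [0.0, 0.2, 0.5, 1.0]
noisy = multiplicative_noise(clean, eta=1.0, rng=rng)

print(all(n <= c for n, c in zip(noisy, clean)))  # True: no value increases
print(noisy[0] == 0.0)                            # True: a zero minimum is untouched
```

If the smallest input value were positive, the same rule could push some cell below it, which is how spurious minima could enter the min-diffusion layer.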



IV. DISCUSSION 

A. Pseudo-diffusion and electrical synapses (gap 
junctions) 

The operator $\mathcal{K}_\lambda$ models different types of electrical
synapses (gap junctions). In its linear version, $\mathcal{K}_{\lambda=0}$
describes the exchange of both depolarizing (i.e. directed
towards a neuron's firing threshold) and hyperpolarizing 
(i.e. directed away from a neuron's firing threshold) cur- 
rents between adjacent neurons. Networks of electrically 



coupled neurons are ubiquitous both in the retina (e.g. 
[H [H [H [H, HI) and the cortex (e.g. H US El)- 
These networks can be modeled by diffusion equations 
(e.g. [U [H, HiJ H|). Conversely, the operators defined by equations [15] and [16]
represent models for rectifying (i.e. voltage sensitive) gap junctions. Rectifying
gap junctions were described in the crayfish (e.g. [31], [32]),
and unidirectional and gated gap junctions were reported
in the rat (e.g. (H) and turtle (e.g. [H, HI), respectively.

In organisms, rectifying gap junctions may nevertheless 
be implemented in a "dirty" fashion. This means that a 
current flux may not strictly occur in only one direction. 
Rather, a small amount of current may as well flow in the 
opposite direction. Such behavior is captured by setting 
$\lambda$ to a finite value $1 \ll |\lambda| < \infty$, and was analyzed in
figure O 

B. Computational aspects 

Substitution of two global memories (for the minimum 
and the maximum activity) by two pseudo-diffusion lay- 
ers of size N x N leads to a computationally more de- 
manding system, because more memory resources are 
needed and significantly more computational operations 
need to be carried out for their simulation. Moreover, be- 
cause computation of the global maximum or minimum is 
FIG. 12: Dynamic compression, entropy and Gaussian noise. (a) "Entropy scan" of the
tiled Peppers image. Shannon entropy [17] was computed as a function of the number of
histogram bins. The curve starts to saturate at approximately $2^{20} = 1048576$ bins
(the maximum value allowed with the computer that was used for the simulations). Thus,
the entropy of the tiled image is > 14 bits. (b) Shannon entropy of the dynamic
normalization layer as a function of iterations. The tiled Peppers image served as
input. Here, hardware constraints only permitted the computation of entropy with
$15 \cdot 10^{5}$ bins. Each curve represents a different amount of temporally varying
and normal-distributed noise (additive "Gaussian noise") with standard deviations
$\sigma \in \{0, 10^{-5}, 10^{-4}, 10^{-3}\}$ (see inset). The dynamic compression
effect is associated with plateau-like maxima in the entropy curves. The dashed
horizontal line denotes the entropy of the input image.



based on local, diffusion-like interactions, a maximum or 
a minimum does not propagate from one cell to another 
from one time step to the next. The diffusion rate cannot be chosen arbitrarily high,
because the numerical stability of the process has to be guaranteed. The time to
convergence does
not only depend on the pseudo-diffusion layers reaching 
a steady-state, but is mainly determined by the dynamic 
normalization layer. The number of iterations that is 
needed until convergence occurs scales with the numeri- 
cal range of the input. Thus, for small input values, the 



number of required iterations can be quite large (see fig- 
ure [5]). Therefore, the dynamic normalization network 
cannot be seriously considered as an alternative to an or- 
dinary normalization algorithm (i.e., searching the global 
maximum and minimum, and then rescaling). However, 
the dynamic normalization network can accomplish dif- 
ferent tasks which cannot be accomplished with an ordi- 
nary normalization algorithm, for example detection of 
contrast contours, or compression of the dynamic range 
of the input. 



V. CONCLUSIONS 

This paper introduced a parameterized diffusion operator (parameter $\lambda$) and
analyzed some of its properties mathematically and by computer simulations. As a
special case, heat diffusion is obtained for $\lambda = 0$. Diffusion layers which are
based on the two limit cases of the operator (for $\lambda \to \pm\infty$) compute the
global maximum and minimum, respectively, of the initial cell activities of the layer.
This means that at convergence, all cells of the diffusion layers contain the same
activity value - the maximum ($\lambda \to \infty$) or the minimum
($\lambda \to -\infty$). Based on these operators, a dynamic normalization network was
defined (equation [27]). Its steady-state solution is functionally equivalent to the
ordinary rescaling of a set of
numbers (equation [25]), but by making the normalization
process dynamic, one observes two additional properties: 
contrast enhancement and dynamic range compression. 
Both effects occur because at first normalization acts lo- 
cally, similar to adaptation mechanisms. With increasing 
time, the normalization process gets continuously more 
global, until a steady-state is reached. The steady-state 
corresponds to a rescaling of the input in the dynamic 
normalization layer. 
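
The following sketch illustrates how the two limit cases can find the global extrema by purely local interactions, and how rescaling then follows. It rests on assumptions: the limit operators are modelled as half-wave rectification of the activity difference to each neighbour (consistent with the Heaviside-like derivative derived in Appendix B), the layer is one-dimensional, and only the steady-state rescaling is computed rather than the dynamic equation [27]. The function name `pseudo_diffusion_step` and its parameters are hypothetical.

```python
def pseudo_diffusion_step(x, rate=0.5, mode="max"):
    """One explicit update of a 1-D pseudo-diffusion layer.

    mode="max" lets only positive differences flow in (activity can rise),
    mode="min" only negative ones (activity can fall). The flow is
    normalized by the number of adjacent cells.
    """
    if mode == "max":
        rectify = lambda z: max(z, 0.0)
    else:
        rectify = lambda z: min(z, 0.0)
    out = []
    for i, xi in enumerate(x):
        nbrs = [x[j] for j in (i - 1, i + 1) if 0 <= j < len(x)]
        flow = sum(rectify(v - xi) for v in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(xi + rate * flow)
    return out


s = [2.0, 1.0, 3.0, 5.0, 2.5]       # input activities
a, b = list(s), list(s)             # min- and max-diffusion layers
for _ in range(500):
    a = pseudo_diffusion_step(a, mode="min")
    b = pseudo_diffusion_step(b, mode="max")

# Steady-state rescaling of the input with the converged layers.
c = [(si - ai) / (bi - ai) for si, ai, bi in zip(s, a, b)]
print([round(v, 3) for v in c])
```

After a single update the leftmost cell of the max layer is still at its own value, because the global maximum has not reached it yet. This finite propagation speed is what makes early normalization local.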

By exploiting the dynamic compression effect, it should thus be possible to design a
powerful adaptation mechanism which maps an input image of an arbitrary numerical
range to a smaller target range. To do so, the normalization process has to be
"frozen" when dynamic compression occurs. As a first step in that direction, a simple
adaptation algorithm based on the maximisation of entropy was proposed (section
III D): the dynamic is frozen as soon as a maximum of entropy is reached, and the
output is then fed back as new input to the dynamic normalization network. As a
further improvement, the diffusion operators could be modified such that activity
exchange between two cells is blocked for sufficiently large activity gradients [35].
Doing so would possibly prevent the global maximum from spreading between tiles in
figure [9] (first and second row), and would normalize each tile independently such
that ideally a dynamic similar to the bottom row in figure [9] is produced.
Systems based on pseudo-diffusion have already turned out to be of utility for a
variety of purposes in image processing (for implementing filling-in mechanisms, or
winner-takes-all inhibition, see [35]). Pseudo-diffusion
FIG. 13: Histogram evolution of the tiled Peppers image. Each horizontal line in the above image represents a histogram
of the dynamic normalization layer (abscissa: 1024 bins per line) at a different iteration number (ordinate). Occurrence
frequencies of output activity values are represented by colors (inset: colorbar).

Initially, cells in the dynamic normalization layer have small activities and concentrate in the first histogram bins (upper
left corner; the first 14 time steps were dropped for visualization reasons). This highly predictable state is associated with
low entropy. Immediately afterwards, values distribute themselves homogeneously over virtually all bins, and thus entropy
increases. This is when dynamic compression occurs: an observer who is monitoring the output of the dynamic normalization
network gathers the highest information about the input image. In other words, dynamic range compression is compatible with
adaptation, since adaptation maximizes the transfer of information [18]. Subsequently, values are redistributed again in a way
that they concentrate along four principal stripes reflecting the four contrast tiles. This state is again associated with a lower
entropy. Notice that both the way to and from the more homogeneous distribution of values are mirrored in structures similar
to faint "trajectories" that sweep from left to right across the histogram.



systems can generally be used for implementing the max-operation without the need
for globally acting pooling units (see for example [36] and [37]). The advantage over
functionally equivalent but hardwired systems is that the region where normalization
takes place can be dynamically adjusted. Furthermore, the maximum operation serves to
implement invariance properties in models for object recognition (e.g. [38], [39]).
APPENDIX A: MATERIAL AND METHODS

All simulations were carried out using the Matlab environment (R2006b) on a Linux
workstation, where both native Matlab code and mex-files programmed in C++ were used.
Diffusion operators were normalized by the number of adjacent cells (normally four,
along the domain boundaries three, and in the corners two). Normally, the equations
describing the diffusion layers (eqs. [19] and [20]) and dynamic normalization
(eq. [27]) turned out to be numerically stable such that a forward-time-centered-space
(FTCS) Euler scheme with step size one is sufficient. (Here, we understand numerical
stability such that the solution converges rather than growing in an unbounded
fashion.) Notice, however, the stability criterion associated with the
FTCS-integration of the heat diffusion equation, $D \cdot \Delta t \le 1/2$
(assuming grid spacing
FIG. 14: Adaptation by entropy maximisation (entropy). Each point of the curves
"original", "4th power", and "tiled" corresponds to the entropy $H$ (normalized by the
entropy of each respective original image $H_{\mathrm{original}}$) at the indicated
number of feedback loops of the entropy maximizing adaptation algorithm (output images
at one, two, three and 20 loops are shown in the previous figure). In addition, the
curve designated by "process entropy" relates the relative entropy of the full
algorithm to the data points of the "tiled" Peppers image: between any two data
points, the dynamic normalization network was run by using the output image at the
previous data point as input until a maximum in entropy has been reached. See text for
further details.



one, see section 19.2 in [40]), where $D$ is the diffusion coefficient, and $\Delta t$
is the integration step size. Since we compared pseudo-diffusion with Laplacian or
heat diffusion (eq. [17]), by default we employed Euler's method with integration step
size $\Delta t = 0.5$ and diffusion coefficient $D = 1$. Exceptions are as follows.
Figure [5] was simulated with $\Delta t = 0.1$ and $\Delta t = 0.001$, respectively.
Figure El IH and figures [9] to [16] were integrated with the fourth-order Runge-Kutta
method ($\Delta t = 0.5$, $D = 1$). For the compilation of figure [17], again the
fourth-order Runge-Kutta method was used with $\Delta t = 0.01$ and $D = 1/\Delta t$,
to guarantee numerical stability in the presence of high
noise levels.
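
The FTCS stability criterion can be checked with a short experiment. The sketch below integrates the linear heat equation on a one-dimensional grid with spacing one and zero-flux boundaries; the one-dimensional setting, the boundary treatment and the function names are assumptions chosen for brevity, whereas the simulations of this paper use two-dimensional layers.

```python
def ftcs_heat_step(u, D, dt):
    """One FTCS step of u_t = D * u_xx on a 1-D grid (spacing 1, zero-flux ends)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        out.append(u[i] + D * dt * (left - 2.0 * u[i] + right))
    return out


def max_abs_after(u, D, dt, steps):
    """Largest absolute value in the layer after `steps` FTCS updates."""
    for _ in range(steps):
        u = ftcs_heat_step(u, D, dt)
    return max(abs(v) for v in u)


# Alternating pattern: the spatial mode that becomes unstable first.
u0 = [(-1.0) ** i for i in range(32)]
print(max_abs_after(u0, D=1.0, dt=0.5, steps=50))  # stays bounded
print(max_abs_after(u0, D=1.0, dt=0.6, steps=50))  # grows by orders of magnitude
```

With D = 1 the default step size of 0.5 sits exactly at the stability limit, where each update is a plain average of neighbouring cells.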

It should be emphasized that the results presented in this paper do not depend
critically on the exact values of $\Delta t$ and $D$, nor on the specific choice of
the integration method. Variation of these parameters leads to a corresponding
rescaling of the time axis. Although we exemplified the behavior of the model only by
means of two standard images which are commonly used for image processing (Lena and
Peppers), all characteristics of the model can be reproduced with other images as
well.
FIG. 15: Adaptation by entropy maximisation (results). The first column shows the
original images which were used as input to the adaptation-by-entropy-maximisation
algorithm: "original" is the original Peppers image; "4th power" is the original
Peppers image with luminance values raised to the fourth power (in this way a
high-dynamic-range image is created); "tiled" is the tiled Peppers image (cf.
figure [9]). The numbers designate how many feedback loops of the algorithm
were run.



APPENDIX B: PROOF OF EQUATION [10] (FOR $\lambda \to \infty$ AND $\lambda = 0$)

Consider the derivative of the operator $T_\lambda[\cdot]$ (equationHJ),

$$\frac{dT_\lambda[z]}{dz} = \underbrace{\frac{\eta}{1 + e^{-\lambda z}}}_{\text{term I}} + \underbrace{\frac{\eta \lambda z\, e^{-\lambda z}}{\left(1 + e^{-\lambda z}\right)^{2}}}_{\text{term II}} \qquad \text{(B1)}$$

where the following three cases have to be analyzed:

Case $\lambda = 0$. In this case $\eta = 2$ from equation^ and

$$\frac{dT_0[z]}{dz} = \frac{\eta}{2} = 1. \qquad \text{(B2)}$$

Thus, for $\lambda = 0$ the derivative is constant one for
all $z$, and equation [7] reduces to the linear diffusion
equation [6].

Case $\lambda \to +\infty$. In this case $\eta = 1$ from equation^ and
we have to consider three additional cases according to the value of $z$.
Note that $z$ is treated as a constant. Hence,

$z = 0$.

$$\lim_{\lambda \to +\infty} \left. \frac{dT_\lambda[z]}{dz} \right|_{z=0} = \frac{\eta}{2} = \frac{1}{2}. \qquad \text{(B3)}$$

This is to say that if the gradient $z$ vanishes,
then the derivative is constant with value 1/2.

$z > 0$. We start with evaluating term I of equation [B1],

$$\lim_{\lambda \to +\infty} \frac{\eta}{1 + e^{-\lambda |z|}} = \eta = 1 \quad \text{(term I)}. \qquad \text{(B4)}$$



FIG. 16: Dynamic compression in the presence of noise. Same as figure [10] but here
with additive and temporally varying Gaussian noise (standard deviation
$\sigma = 0.001$, zero mean). Snapshots of the noisy dynamic are shown in the middle
row of figure [9]. Evidently, suitably chosen noise levels can enhance the dynamic
compression effect. See also figure [12]b) for the dependence of entropy on noise
levels.



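In the numerator of term II appears a product of the kind "$\infty \cdot 0$". One may
argue that the exponential $\exp(-\lambda|z|)$ always approaches zero much faster than
the term $\eta\lambda|z|$ is able to grow (or one may equivalently apply l'Hôpital's
rule to this product by applying $d/d\lambda$ on each factor),

$$\lim_{\lambda \to +\infty} \frac{\eta \lambda |z|\, e^{-\lambda |z|}}{\left(1 + e^{-\lambda |z|}\right)^{2}} = \lim_{\lambda \to +\infty} \frac{\eta \lambda |z|}{e^{\lambda |z|}} = 0 \quad \text{(term II)}. \qquad \text{(B5)}$$

Thus, for $\lambda \to +\infty$, equation [B1] evaluates to
1 for all $z > 0$.

$z < 0$. Evaluating term I,

$$\lim_{\lambda \to +\infty} \frac{\eta}{1 + e^{\lambda |z|}} = 0 \quad \text{(term I)}. \qquad \text{(B6)}$$

Evaluating term II (again there is a little more
work to do),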
FIG. 17: Dynamic normalization with additive Gaussian noise. A luminance step
(activities zero and one) was used to analyze the behavior of the dynamic
normalization network in the presence of additive and temporally varying Gaussian
noise (zero mean, standard deviation $\sigma$ as indicated in the plot). (a) Activity
of dynamic normalization averaged over the cells corresponding to the white region of
the luminance step. (b) Michelson contrast between the black and the white region of
the step, averaged over respective positions. See text for further details.



$$\lim_{\lambda \to +\infty} \frac{-\eta \lambda |z|\, e^{\lambda |z|}}{\left(1 + e^{\lambda |z|}\right)^{2}} = \lim_{\lambda \to +\infty} \frac{-\eta \lambda |z|}{e^{\lambda |z|}} = 0 \quad \text{(term II)}. \qquad \text{(B7)}$$

Hence, for $\lambda \to +\infty$, equation [B1] evaluates to 0 for all $z < 0$.

Summarizing the above, we saw that equation [B1] behaves approximately [41] like a
Heaviside function $H$ for $\lambda \to +\infty$; thus equation [10] is proved. The
proof of equation [11] (for $\lambda \to \infty$) proceeds in straight analogy.